Robust learning with implicit residual networks
In this effort, we propose a new deep architecture utilizing residual blocks
inspired by implicit discretization schemes. In contrast to standard
feed-forward networks, the outputs of the proposed implicit residual blocks are
defined as fixed points of appropriately chosen nonlinear transformations. We
show that this choice improves the stability of both the forward and backward
propagations, has a favorable impact on generalization, and allows one to
control the robustness of the network with only a few hyperparameters. In
addition, the proposed reformulation of ResNet
does not introduce new parameters and can potentially lead to a reduction in
the number of required layers due to improved forward stability. Finally, we
derive a memory-efficient training algorithm, propose a stochastic
regularization technique, and provide numerical results in support of our
findings.
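
As an illustration of the fixed-point construction described above, the
following minimal PyTorch sketch (our own illustration; the class name, the
choice of f, and the solver settings are assumptions, not the paper's code)
computes the block output as a fixed point of y = x + f(y). Gradients here flow
through the unrolled iterations, whereas the paper derives a memory-efficient
training algorithm that avoids this cost.

```python
import torch
import torch.nn as nn

class ImplicitResidualBlock(nn.Module):
    """Residual block whose output y is a fixed point of y = x + f(y),
    an implicit (backward-Euler-style) update, instead of the explicit
    ResNet update y = x + f(x). Note that f has the same parameters as
    a standard residual block, so no new parameters are introduced."""

    def __init__(self, dim, max_iters=30, tol=1e-4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                               nn.Linear(dim, dim))
        self.max_iters = max_iters
        self.tol = tol

    def forward(self, x):
        y = x  # initialize the iteration at the block input
        for _ in range(self.max_iters):
            y_next = x + self.f(y)  # one fixed-point iteration
            if torch.norm(y_next - y) < self.tol * (torch.norm(y) + 1e-12):
                return y_next
            y = y_next
        return y

block = ImplicitResidualBlock(dim=8)
out = block(torch.randn(4, 8))  # output has shape (4, 8)
```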
A Dynamically Adaptive Sparse Grid Method for Quasi-Optimal Interpolation of Multidimensional Analytic Functions
In this work we develop a dynamically adaptive sparse grid (SG) method for
quasi-optimal interpolation of multidimensional analytic functions defined over
a product of one-dimensional bounded domains. The goal of this approach is to
construct an interpolant in the polynomial space corresponding to the "best
M-terms," based on a sharp a priori estimate of the polynomial coefficients. In
the past, SG
methods have been successful in achieving this, with a traditional construction
that relies on the solution of a knapsack problem: only the most profitable
hierarchical surpluses are added to the SG. However, this approach requires
additional sharp estimates related to the size of the analytic region and the
norm of the interpolation operator, i.e., the Lebesgue constant. Instead, we
present an iterative SG procedure that adaptively refines an estimate of the
region and accounts for the effects of the Lebesgue constant. Our approach does
not require any a priori knowledge of the analyticity region or the operator
norm, is easily generalized to both affine and non-affine analytic functions,
and can be applied to sparse grids built from one-dimensional rules with
arbitrary growth
of the number of nodes. In several numerical examples, we utilize our
dynamically adaptive SG to interpolate quantities of interest related to the
solutions of parametrized elliptic and hyperbolic PDEs, and compare the
performance of our quasi-optimal interpolant to several alternative SG schemes.
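
For reference, the traditional knapsack-style construction mentioned above can
be sketched as a greedy selection of the most profitable multi-indices under an
assumed a priori coefficient decay (plain Python/NumPy; the decay model
rho[k]**(-nu[k]) and all names are illustrative assumptions). The paper's
adaptive procedure goes further by iteratively refining the estimate of the
analytic region and accounting for the Lebesgue constant, which this sketch
omits.

```python
import heapq
import numpy as np

def quasi_optimal_index_set(rho, n_terms):
    """Greedily select multi-indices nu with the largest a priori
    coefficient estimates prod_k rho[k]**(-nu[k]) (an assumed
    anisotropic decay model with each rho[k] > 1), i.e. the
    knapsack-style "most profitable surpluses first" construction."""
    dims = len(rho)
    est = lambda nu: float(np.prod([rho[k] ** (-nu[k]) for k in range(dims)]))
    root = (0,) * dims
    heap = [(-est(root), root)]  # max-heap via negated estimates
    seen, selected = {root}, []
    while heap and len(selected) < n_terms:
        _, nu = heapq.heappop(heap)
        selected.append(nu)
        # enqueue forward neighbors; the monotone decay keeps the
        # selected set (approximately) downward closed
        for k in range(dims):
            child = tuple(n + (j == k) for j, n in enumerate(nu))
            if child not in seen:
                seen.add(child)
                heapq.heappush(heap, (-est(child), child))
    return selected

print(quasi_optimal_index_set(rho=[2.0, 4.0], n_terms=6))
```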
Greedy Shallow Networks: An Approach for Constructing and Training Neural Networks
We present a greedy approach to constructing an efficient single-hidden-layer
neural network with ReLU activation that approximates a target function. In our
approach, we obtain a shallow network by applying a greedy algorithm to a
prescribed dictionary built from the available training data and a set of
possible inner weights. To facilitate the greedy selection
process we employ an integral representation of the network, based on the
ridgelet transform, that significantly reduces the cardinality of the
dictionary and hence makes the greedy selection tractable. Our approach
allows for the construction of efficient architectures which can be treated
either as improved initializations to be used in place of random-based
alternatives, or, in certain cases, as fully trained networks, thus potentially
eliminating the need for backpropagation training. Numerical experiments
demonstrate the viability of the proposed approach and its advantages over
conventional techniques for selecting architectures and initializations for
neural networks.
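
The greedy selection step can be illustrated with a generic orthogonal matching
pursuit sketch over a fixed dictionary of ReLU atoms (NumPy; all names are
hypothetical, and the user-supplied candidate weights W and biases b stand in
for the ridgelet-based dictionary reduction of the paper, which is not
implemented here).

```python
import numpy as np

def greedy_shallow_relu(X, y, W, b, n_neurons):
    """OMP-style greedy construction of a one-hidden-layer ReLU network.
    X: (n, d) training inputs; y: (n,) targets;
    W: (m, d) candidate inner weights, b: (m,) candidate biases
    (together, the dictionary of ReLU ridge atoms)."""
    Phi = np.maximum(X @ W.T + b, 0.0)            # (n, m) atom evaluations
    norms = np.linalg.norm(Phi, axis=0) + 1e-12
    residual = y.astype(float).copy()
    chosen, coef = [], np.zeros(0)
    for _ in range(n_neurons):
        scores = np.abs(Phi.T @ residual) / norms  # normalized correlations
        scores[chosen] = -np.inf                   # do not reselect atoms
        chosen.append(int(np.argmax(scores)))
        A = Phi[:, chosen]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # refit outer weights
        residual = y - A @ coef
    return chosen, coef

# toy usage: approximate f(x) = |x| on [-1, 1] with two ReLU neurons
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.abs(X).ravel()
W = np.array([[1.0], [-1.0], [1.0], [-1.0]])
b = np.array([0.0, 0.0, 0.5, 0.5])
idx, c = greedy_shallow_relu(X, y, W, b, n_neurons=2)
```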